@namya28 namya28 commented Oct 17, 2025

Description

In the Elasticsearch connector, mixed-case support applies only to column names and schema names.

Index names in Elasticsearch are mapped to table names in the Presto Elasticsearch connector. According to the official documentation, index names must be lowercase, so uppercase and mixed-case table names are not supported.
https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-create#operation-indices-create-path
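
A minimal sketch of the lowercase rule described above, for illustration only. The class and method names are hypothetical and are not part of this PR, and the check covers only the lowercase requirement, not the other index naming restrictions that Elasticsearch enforces.

```java
import java.util.Locale;

// Hypothetical helper, not part of the connector.
final class IndexNameCheck
{
    private IndexNameCheck() {}

    // Returns true when lowercasing leaves the name unchanged,
    // i.e. the name contains no uppercase characters.
    static boolean isLowercase(String indexName)
    {
        return indexName.equals(indexName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args)
    {
        System.out.println(isLowercase("orders"));
        System.out.println(isLowercase("TestIndex"));
    }
}
```

Because table names map to index names, a mixed-case table name such as `TestIndex` has no valid index to resolve to, which is why only schema and column names are in scope.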

Motivation and Context

Impact

Test Plan

Test cases have been introduced and are passing successfully.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

Elasticsearch Connector Changes
* Add mixed case support for Elasticsearch connector 

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Oct 17, 2025

sourcery-ai bot commented Oct 17, 2025

Reviewer's Guide

Enable configurable case-sensitive name matching in the Elasticsearch connector by introducing a new flag in configuration, updating metadata resolution to normalize identifiers based on this flag, and adding supporting tests and test harness adjustments.

Sequence diagram for identifier normalization in metadata resolution

sequenceDiagram
participant "ConnectorSession"
participant "ElasticsearchMetadata"
participant "ElasticsearchConfig"

"ConnectorSession"->>"ElasticsearchMetadata": Request table/column metadata
"ElasticsearchMetadata"->>"ElasticsearchConfig": Check caseSensitiveNameMatching
"ElasticsearchMetadata"->>"ElasticsearchMetadata": normalizeIdentifier(session, identifier)
"ElasticsearchMetadata"-->>"ConnectorSession": Return normalized metadata
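
The normalization step in the sequence above could look roughly like the following. This is a sketch under assumptions: the class name is hypothetical, the `ConnectorSession` parameter from the real signature is omitted so the example compiles on its own, and the use of `Locale.ROOT` is a choice made here rather than something the PR confirms.

```java
import java.util.Locale;

// Hypothetical stand-in for the normalization logic in ElasticsearchMetadata.
final class IdentifierNormalizer
{
    private final boolean caseSensitiveNameMatching;

    IdentifierNormalizer(boolean caseSensitiveNameMatching)
    {
        this.caseSensitiveNameMatching = caseSensitiveNameMatching;
    }

    // Preserve the identifier when case-sensitive matching is enabled;
    // otherwise fold it to lowercase.
    String normalizeIdentifier(String identifier)
    {
        return caseSensitiveNameMatching
                ? identifier
                : identifier.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args)
    {
        System.out.println(new IdentifierNormalizer(false).normalizeIdentifier("UserName"));
        System.out.println(new IdentifierNormalizer(true).normalizeIdentifier("UserName"));
    }
}
```

With the flag off, `UserName` and `username` normalize to the same key; with the flag on, they remain distinct.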

Class diagram for updated ElasticsearchConfig and ElasticsearchMetadata

classDiagram
class ElasticsearchConfig {
  - boolean ignorePublishAddress
  - boolean verifyHostnames
  - Security security
  + boolean caseSensitiveNameMatching
  + boolean isCaseSensitiveNameMatching()
  + ElasticsearchConfig setCaseSensitiveNameMatching(boolean)
}

class ElasticsearchMetadata {
  - ElasticsearchClient client
  - String schemaName
  - Type ipAddressType
  + boolean caseSensitiveNameMatching
  + String normalizeIdentifier(ConnectorSession, String)
}

ElasticsearchMetadata --> ElasticsearchConfig: uses
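
As a rough illustration of the config side of the diagram, the sketch below mirrors the getter and fluent setter. The class name is hypothetical, the other fields are left out, and the Airlift `@Config` annotation that binds the property name is omitted so the example stays self-contained.

```java
// Hypothetical stand-in for the new property on ElasticsearchConfig.
final class CaseSensitivityConfigSketch
{
    // Java initializes boolean fields to false, so the default is "off"
    // whether or not an explicit initializer is written.
    private boolean caseSensitiveNameMatching;

    boolean isCaseSensitiveNameMatching()
    {
        return caseSensitiveNameMatching;
    }

    // Returns this so that setters can be chained.
    CaseSensitivityConfigSketch setCaseSensitiveNameMatching(boolean caseSensitiveNameMatching)
    {
        this.caseSensitiveNameMatching = caseSensitiveNameMatching;
        return this;
    }

    public static void main(String[] args)
    {
        CaseSensitivityConfigSketch config = new CaseSensitivityConfigSketch();
        System.out.println(config.isCaseSensitiveNameMatching());
        System.out.println(config.setCaseSensitiveNameMatching(true).isCaseSensitiveNameMatching());
    }
}
```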

File-Level Changes

Change: Implement case-sensitive name matching logic in connector metadata
Details:
  • Introduce caseSensitiveNameMatching flag in ElasticsearchMetadata and wire it through the constructor
  • Override normalizeIdentifier to use the flag for lowercasing or preserving case
  • Refactor getColumnFields to accept session and use normalizeIdentifier for grouping and filtering
Files:
  • presto-elasticsearch/src/main/java/com/facebook/presto/elasticsearch/ElasticsearchMetadata.java

Change: Add and test new case-sensitive name matching configuration
Details:
  • Add caseSensitiveNameMatching property with getter, setter annotated with @Config, and default value
  • Update TestElasticsearchConfig to include default false and explicit true mappings
Files:
  • presto-elasticsearch/src/main/java/com/facebook/presto/elasticsearch/ElasticsearchConfig.java
  • presto-elasticsearch/src/test/java/com/facebook/presto/elasticsearch/TestElasticsearchConfig.java

Change: Adjust ElasticsearchQueryRunner to include case sensitivity and default schema config
Details:
  • Switch to ImmutableMap.Builder for connector properties
  • Conditionally add default schema name only when not overridden
  • Build and pass the finalized config map (newconfig) to createCatalog
Files:
  • presto-elasticsearch/src/test/java/com/facebook/presto/elasticsearch/ElasticsearchQueryRunner.java

Change: Add mixed-case integration tests for connector behavior
Details:
  • Create TestElasticsearchMixedCaseTest with scenarios for SHOW columns, SELECT, and SHOW schemas
  • Verify connector respects caseSensitiveNameMatching=true and preserves mixed-case identifiers
Files:
  • presto-elasticsearch/src/test/java/com/facebook/presto/elasticsearch/TestElasticsearchMixedCaseTest.java
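
The conditional default-schema step in the query runner change might be sketched as follows. This is not the PR's code: it uses a plain `LinkedHashMap` in place of Guava's `ImmutableMap.Builder` to stay dependency-free, and the class name, the placeholder host, and the `tpch` default are assumptions made for illustration. The conditional add matters with the real builder because `ImmutableMap.Builder.build()` rejects duplicate keys.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the property assembly in ElasticsearchQueryRunner.
final class ConnectorPropertiesSketch
{
    private static final String DEFAULT_SCHEMA_KEY = "elasticsearch.default-schema-name";
    private static final String DEFAULT_SCHEMA = "tpch"; // assumed default, for illustration

    private ConnectorPropertiesSketch() {}

    static Map<String, String> buildProperties(Map<String, String> overrides)
    {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("elasticsearch.host", "localhost"); // placeholder host

        // Add the default schema only when the caller has not supplied one.
        if (!overrides.containsKey(DEFAULT_SCHEMA_KEY)) {
            config.put(DEFAULT_SCHEMA_KEY, DEFAULT_SCHEMA);
        }
        config.putAll(overrides);
        return Collections.unmodifiableMap(config);
    }

    public static void main(String[] args)
    {
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("case-sensitive-name-matching", "true");
        overrides.put(DEFAULT_SCHEMA_KEY, "MySchema");
        System.out.println(buildProperties(overrides));
    }
}
```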

@namya28 namya28 changed the title from "feat(Elasticsearch): Add mixed case support for Elasticsearch connector" to "feat(elasticsearch): Add mixed case support for Elasticsearch connector" Oct 17, 2025
@namya28 namya28 changed the title from "feat(elasticsearch): Add mixed case support for Elasticsearch connector" to "feat(connector): Enable case senstivity support in the Elasticsearch connector" Oct 21, 2025
@namya28 namya28 force-pushed the ElasticsearchMixedcaseSupport branch from e62b68f to d6bcf3d Compare October 21, 2025 07:25
@namya28 namya28 changed the title from "feat(connector): Enable case senstivity support in the Elasticsearch connector" to "feat(plugin-elasticsearch): Enable case senstivity support in the Elasticsearch connector" Oct 21, 2025
@namya28 namya28 marked this pull request as ready for review October 21, 2025 12:04
@namya28 namya28 requested review from a team, elharo and steveburnett as code owners October 21, 2025 12:04
@prestodb-ci prestodb-ci requested review from a team, jkhaliqi and libianoss and removed request for a team October 21, 2025 12:04

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • Change @Config("case-sensitive-name-matching") to @Config("elasticsearch.case-sensitive-name-matching") so the new setting uses the same prefix as other Elasticsearch connector properties.
  • In ElasticsearchQueryRunner you build the config map then immediately create a new variable newconfig for the same map; consider building and passing the map in one step to avoid the redundant local.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Change @Config("case-sensitive-name-matching") to @Config("elasticsearch.case-sensitive-name-matching") so the new setting uses the same prefix as other Elasticsearch connector properties.
- In ElasticsearchQueryRunner you build the config map then immediately create a new variable `newconfig` for the same map; consider building and passing the map in one step to avoid the redundant local.

## Individual Comments

### Comment 1
<location> `presto-elasticsearch/src/main/java/com/facebook/presto/elasticsearch/ElasticsearchConfig.java:59` </location>
<code_context>
     private boolean ignorePublishAddress;
     private boolean verifyHostnames = true;
     private Security security;
+    private boolean caseSensitiveNameMatching;

     @NotNull
</code_context>

<issue_to_address>
**suggestion:** Default value for caseSensitiveNameMatching is not set.

Initialize caseSensitiveNameMatching to false to clarify its default behavior and prevent ambiguity.

```suggestion
    private boolean caseSensitiveNameMatching = false;
```
</issue_to_address>

### Comment 2
<location> `presto-elasticsearch/src/test/java/com/facebook/presto/elasticsearch/TestElasticsearchMixedCaseTest.java:41` </location>
<code_context>
+                .setCaseSensitiveNameMatching(false));
     }

     @Test
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding negative tests for unsupported index name cases.

Add a test that uses uppercase or mixed-case index names to verify the connector returns a clear error or fails gracefully, ensuring compliance with Elasticsearch's lowercase index name requirement.

```suggestion
    @Test
    public void testUppercaseIndexNameFails() throws Exception
    {
        String invalidIndexName = "TestIndex";
        try {
            // Attempt to create an index with uppercase letters
            client.indices().create(new org.elasticsearch.client.indices.CreateIndexRequest(invalidIndexName), DEFAULT);
            // If no exception, fail the test
            assertTrue(false, "Expected exception for uppercase index name");
        }
        catch (org.elasticsearch.ElasticsearchStatusException e) {
            // Elasticsearch should reject uppercase index names
            assertTrue(e.getMessage().contains("Invalid index name"), "Error message should mention invalid index name");
        }
        catch (Exception e) {
            // Other exceptions may also be thrown, but should indicate invalid index name
            assertTrue(e.getMessage().toLowerCase().contains("invalid") || e.getMessage().toLowerCase().contains("uppercase"), "Error message should mention invalid index name or uppercase");
        }
    }

    @Test
```
</issue_to_address>

### Comment 3
<location> `presto-docs/src/main/sphinx/connector/elasticsearch.rst:55` </location>
<code_context>
 ``elasticsearch.max-http-connections``        Maximum number of persistent HTTP connections to Elasticsearch.
 ``elasticsearch.http-thread-count``           Number of threads handling HTTP connections to Elasticsearch.
 ``elasticsearch.ignore-publish-address``      Whether to ignore the published address and use the configured address.
+``case-sensitive-name-matching``              Enable case sensitive identifier support for schema and column names for the connector.
+                                              When disabled, names are matched case-insensitively using lowercase normalization.
+                                              Default is ``false``.
</code_context>

<issue_to_address>
**nitpick (typo):** Consider hyphenating 'case sensitive' to 'case-sensitive' for grammatical correctness.

As a compound adjective preceding 'identifier support', 'case-sensitive' is the correct form.

```suggestion
``case-sensitive-name-matching``              Enable case-sensitive identifier support for schema and column names for the connector.
```
</issue_to_address>

private boolean ignorePublishAddress;
private boolean verifyHostnames = true;
private Security security;
private boolean caseSensitiveNameMatching;

suggestion: Default value for caseSensitiveNameMatching is not set.

Initialize caseSensitiveNameMatching to false to clarify its default behavior and prevent ambiguity.

Suggested change
- private boolean caseSensitiveNameMatching;
+ private boolean caseSensitiveNameMatching = false;

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>

@steveburnett steveburnett left a comment

LGTM! (docs)

Pull branch, local doc build.

Applied sourcery-ai suggestion to hyphenate case-sensitive, found no other concerns and am approving the doc. Thanks!

@agrawalreetika agrawalreetika left a comment

Changes lgtm
